Hybrid narrative and categorical strategies for interactive and dynamic video presentation generation
Abstract
There are a number of different approaches for automatically selecting video clips from a video database and sequencing them into meaningful presentations for viewers. The video database represents a multidimensional video hyperspace, and the sequencing algorithms function as (interactive) dynamic linking and path generation techniques within this hyperspace. Sequencing has been based upon either a narrative or a categorical model of video form. Each of these forms has its respective advantages and disadvantages, and varying suitability for different applications. The two primary forms may also be combined into several hybrid forms, both at the same level and at different levels of the syntactic composition of video sequences, to provide more options for authoring interactive dynamic video productions. Narrative, categorical, and hybrid sequence generation strategies can be applied to a variety of media modalities, including the automated generation of behaviour within virtual environments and computer animations.

INTRODUCTION

Adaptive video presentation generation involves the creation of user- and task-specific video presentations from a database of highly recombinable video components. This supports the creation of video programs tuned to viewer needs and preferences, and encourages a potentially high degree of reuse of the video clips in the underlying database. Systems for adaptive video presentation generation have been based upon either a continuity-edited narrative model of video form (1,2), or a categorical model (3,4,5). Each of these methods relies upon a specific model of video syntax, and it is in the terms of that syntax that sequences are explicitly assembled by an algorithm. Each syntax model is a model of how meanings of a particular kind are created by conjoining subsequences having specific independent meanings.
An algorithm that dynamically creates new meanings by clip juxtaposition requires both rules for how that meaning is created from particular clip meanings, and representations of the meanings associated with the clips in the database. Conjoined video clips can have additional meanings (to authors or viewers) encoded in the perceptual structure and interpretation of the audiovisual data, but the sequencing algorithms can only directly process meanings that are explicitly and symbolically represented within the system. Hence perceived meaning is a function of the algorithm(s) used, the audiovisual content of the subsequences, and the symbolic representations associated with them. This creates a complex authoring task. Very simple (and possibly automatically derived, low-level) descriptors might be used to create interesting presentations if the underlying clips are carefully designed. On the other hand, more complex descriptors can provide more explicit control of the meanings of presentations, but then the authorship of the symbolic descriptions becomes a significant creative task. Combined sequencing strategies are of interest for creating a richer syntactic structure for video presentations, and also to provide alternative methods of sequence or subsequence generation when the available video cannot satisfy the sequencing requirements of a single technique. This paper presents two distinct models of video syntax, namely narrative and categorical forms, and then describes approaches for automated sequence generation for these forms. The discussion is particularly concerned with creating coherent and meaningful video presentations by conjoining previously assembled and fully composited video clips or clip subsequences into linear presentation sequences; real-time compositing and overlaying of independently stored audio and visual data is not considered (6).
The paper goes on to describe a number of strategies by which narrative and categorical sequence generation may be combined. Combining the strategies provides authors of the hypermedia space with a richer language for expressing the combinatorial potential of discrete video clips, and provides viewers of generated presentations with a correspondingly richer potential for interaction during presentation generation processes.

MODELS OF SYNTACTIC FORM FOR FILM AND VIDEO

Different theorists focus on different qualities that characterise narrative (8), and the meaning of the term “narrative” varies from narrow interpretations involving strong spatio-temporal continuity to very broad interpretations in terms of the overall formal, rhetorical, or thematic coherence of a production. Narrative in a broad sense has been the goal of numerous research projects dealing with diverse media, from text (e.g. 9) to interactive 3D systems (e.g. 10) and video. When narrative is understood in the specialised sense of causally interconnected actions and events, it is useful to define the following alternative, nonnarrative forms for the organization of cinematic material (derived from 11):

Categorical films use subjects or categories as a basis for their syntactic organisation, typically basing each segment of the film on one category or subcategory. Common examples of stereotyped categorical films include lifestyle and gardening programs, travelogues, and sporting programs.

Rhetorical films present an argument and lay out evidence to support it, with the aim of persuading the audience to hold a particular opinion or belief. Common examples of rhetorical films are television commercials.

Narrative and rhetorical film forms are both distinguished from categorical films by their creation of new meanings by the sequential association of initially distinct video sequences.
In contrast to this, any basic video component in a categorical film can represent a designated categorical meaning expressed in an annotation irrespective of what precedes or follows it. A hierarchy of categorical meanings can define the overall form of a categorical film, but any individual subsequence within the film will have the same categorical meaning that it has within the overall sequence. Each of these forms represents a different (partially codified) syntactic structure for film sequences. In most real films, the forms apply at multiple levels of film structure, a given film sequence may involve multiple forms at the same level, and multiple forms may occur at different levels. These formal models must also be regarded as greatly simplified. However, the simplifications support convenient algorithmic interpretations in the context of interactive video sequencing: computational techniques can be defined for generating cinematic presentations from databases of video material based upon these simple formal models. Properties of coherent cinematic productions that are not explicitly addressed by these models must be addressed by careful composition of the underlying video data.

A CATEGORICAL VIDEO SEQUENCING ALGORITHM

The CSIRO FRAMES project has developed a categorical system for dynamic video sequence synthesis (5). Dynamic presentation synthesis redefines the concept of a video as a presentation instance from a large set of potential instances represented by the underlying database of video clips. Such a presentation instance can be regarded as a virtual video, in the sense that a video presentation is perceived as a traditional linear video presentation, but there is no fixed representation of any predefined presentation order of video clips in the system.
The generation of dynamic virtual videos in the FRAMES system is based upon annotations of stored video, together with a specification of the videos that are to be created, and queries embedded within specifications, expressed using descriptors common to the content models. Video annotations are based upon a multi-level model of video semantics (12,13). Once video components have annotations generated for them, the annotations are stored in a database. The high-level structure of a virtual video program is expressed in a virtual video prescription that can incorporate direct references to specific video components, parametric queries based upon exact or approximate matching of annotations to a query expression, and specifications that initiate the generation of a categorical chain of video content. The generation of video sequences by categorical chaining was first demonstrated in the MIT Automatist system (3,4). The FRAMES prototype extends this concept with the development of a multi-level semantic model for video, a flexible specification language and chaining algorithm, and a weighting mechanism to determine the breadth and depth of semantic categories expressed by a presentation. The FRAMES association specification language supports the specification of annotation types, initial values, soft constraints, weights, and termination conditions for the generation of a categorical video sequence. Associative chaining in the FRAMES system is a method of generating video sequences based upon patterns of similarity and dissimilarity in annotations. Chaining starts with specific parameters that are progressively substituted as the chain develops. At each step of associative chaining, the video component selected for presentation at the next step is the component whose annotations best match the association specification when parameterised using values from the annotations attached to the video segment presented at the current step.
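The elements of such a specification can be pictured with a minimal data structure. This is a sketch only: the field names are invented for illustration and do not reproduce the actual FRAMES specification language.

```python
from dataclasses import dataclass, field

@dataclass
class AssociationSpec:
    """Illustrative stand-in for an association specification: annotation
    types to chain on, initial values, soft constraints, weights, and a
    termination condition (field names are hypothetical)."""
    annotation_types: list            # types to chain on, e.g. ["location", "activity"]
    initial_values: dict              # starting values; omitted types are unbound (NULL)
    constraints: dict = field(default_factory=dict)   # soft constraints per type
    weights: dict = field(default_factory=dict)       # breadth/depth control per type
    termination: dict = field(default_factory=lambda: {"max_items": 10})
    # termination could equally hold a play length or a weight threshold

spec = AssociationSpec(
    annotation_types=["location", "activity"],
    initial_values={"location": "garden"},
    weights={"location": 2.0, "activity": -1.0},  # hold location steady, vary activity
)
```

A positive weight on a type asks the chain to stay within the current category for that type, while a negative weight encourages it to move on, as described for the weighting mechanism above.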
The high-level algorithm for associative chaining is:

1. Initialise the current state description according to the association specification. The current state description includes:
• the specification of annotation types that will be matched in the chaining process,
• current values for those types (including NULL values when initial values are not explicitly given),
• conditions and constraints upon the types and values, and
• weights indicating the significance of particular statements in the specification.
2. Generate a ranked list of video sequences matching the current state description.
3. If no video sequence in the ranked list has a rank greater than a specified minimum rank, go to step 7.
4. Replace the current state description using annotation values from the most highly ranked matching video component: this becomes the new current state description.
5. Output the associated video component identification for the new current state description to the media server.
6. If the termination condition (specified as a play length, number of items, or associative weight threshold) is not yet satisfied, go back to step 2.
7. End.

This algorithm is illustrated by the flow chart shown in Figure 1. Since associative matching is conducted progressively against annotations associated with each successive video component, paths may evolve significantly away from the annotations that match the initial specification. This algorithm has been implemented in the FRAMES demonstrator. Specific filmic structures and forms can be generated in FRAMES by using particular annotations, association criteria and constraints. In this way the sequencing mechanisms remain generic, with emphasis shifting to the authoring of metamodels, annotations, and specifications for the creation of specific types of dynamic virtual video productions. The basic form created explicitly by the chaining engine is categorical. The data model associates typed annotations with video segments.
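As an illustrative sketch only, the seven steps above might be implemented as follows. The FRAMES ranking function is not reproduced here: ranking is reduced to a weighted count of matching annotation values, and the spec, clip identifiers, and annotation data are all invented for the example. Clips are represented as (id, annotations) pairs.

```python
def associative_chain(spec, clips):
    """Sketch of the associative chaining loop; `spec` is a plain dict
    standing in for an association specification."""
    # Step 1: initialise the current state description (unset types are None/NULL).
    state = {t: spec["initial_values"].get(t) for t in spec["types"]}
    sequence = []
    remaining = list(clips)

    def rank(item):
        # Ranking helper for step 2: weighted count of annotations matching
        # the current state. A positive weight rewards keeping the current
        # category value (slow drift); a negative weight penalises keeping
        # it, so that category changes quickly.
        _, ann = item
        return sum(spec["weights"].get(t, 1.0)
                   for t in spec["types"]
                   if state[t] is not None and ann.get(t) == state[t])

    # Step 6: termination condition, here simply a number of items.
    for _ in range(spec["max_items"]):
        if not remaining:
            break
        best = max(remaining, key=rank)        # Step 2: take the top-ranked clip
        if rank(best) <= spec["min_rank"]:     # Step 3: nothing above the minimum rank
            break                              # -> Step 7: end
        clip_id, ann = best
        # Step 4: the chosen clip's annotations become the new state description.
        state = {t: ann.get(t, state[t]) for t in spec["types"]}
        sequence.append(clip_id)               # Step 5: output the component ID
        remaining.remove(best)
    return sequence

clips = [
    ("c1", {"location": "garden", "activity": "planting"}),
    ("c2", {"location": "garden", "activity": "watering"}),
    ("c3", {"location": "kitchen", "activity": "cooking"}),
]
spec = {"types": ["location", "activity"],
        "initial_values": {"location": "garden"},
        "weights": {"location": 1.0, "activity": 1.0},
        "max_items": 3, "min_rank": 0.0}
print(associative_chain(spec, clips))  # chains the garden clips: ['c1', 'c2']
```

Because matching at each step is against the previous clip's annotations rather than the initial specification, a chain built this way can drift away from its starting values, exactly as noted for the FRAMES algorithm above.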
Annotation types can be created that represent category types, and the annotations themselves can be category names. An associative chain is initiated by sending an association specification to the association engine. The specification includes the category types to chain on, as well as initial category values and possible constraints upon values. The rate at which the categories change within each type is determined by the specified weighting attached to the type: the higher the positive weighting, the more slowly the categories will change within that type, while the more negative the weighting, the faster the categories will change. Hence for n category types, the association engine moves through a video annotation search space of n dimensions.

Figure 1. Association Engine Flow Chart.

The quality of a video presentation generated by the association engine depends crucially upon the annotation space design (i.e. the category types, specific categories, and the associations created between categories and video clips), and upon the design of the video clips sequenced by the algorithm (5). It is particularly important for this method of video sequencing that the start and end segments of video clips are compatible with the meanings that the system author wishes to convey by their conjunction. In other words, synthesizing a linear video presentation requires a rhetoric of arrival and departure (14), i.e. cues that make links between hypermedia components meaningful and coherent from the perspective of the viewer traversing the links. In the case of the linear presentation order of video clips, it may be more appropriate to refer to a rhetoric of montage, referring to the system of semiotic codes used to ensure that transitions between clips are meaningful and coherent within the context of the production as a whole. For categorical productions, this may mean avoiding expectations of narrative continuity.
For narrative productions, rules must be followed to satisfy expectations of the continuity of action between cuts, and to ensure that discontinuities convey intended meanings.

A NARRATIVE VIDEO SEQUENCING ALGORITHM

As mentioned above, continuity-edited narrative film is concerned with the creation of a pattern of cause-effect relationships among the diegetic events, actions, and situations represented by a film. Basic video components must also be selected and arranged so that the spectator appreciates the intended themes of a production, which are conveyed not just by the presentation of a causally interconnected sequence of events, but also by how those events are audio-visually represented. The computational model for narrative construction is based upon film theoretical analyses of narrative, and film semiotics (15,16,17,18,19,20,21). Narrative in general is about constructing stories, where a story is a psychological entity that refers to mental or conceptual objects such as themes, goals, events or actions. The dynamics within plot construction are twofold. On one hand, the intentions of the narrator must be achieved, and this relies on communication strategies between narrator and receiver organised around surface structures (expression)
Similar resources
S-MADE: Interactive Storytelling Architecture through Goal Execution and Decomposition
Interactive storytelling, emerging with virtual reality technology, has attracted a lot of research interest in recent years. In order to bridge the gap between story generation and story presentation, some hybrid narrative models have been proposed. In this paper, we propose a new hybrid interactive storytelling architecture, S-MADE. It combines the story plot generation and character performance thro...
Modeling User Knowledge with Dynamic Bayesian Networks in Interactive Narrative Environments
Recent years have seen a growing interest in interactive narrative systems that dynamically adapt story experiences in response to users’ actions, preferences, and goals. However, relatively little empirical work has investigated runtime models of user knowledge for informing interactive narrative adaptations. User knowledge about plot scenarios, story environments, and interaction strategies i...
Focalization in 3D Video Games
This paper investigates Bal’s concept of focalization for 3D video games. First, the argument traces focalization in the historical development of camera strategies in 3D video games. It highlights the detachment of the camera into an interactive operator in its own right. Then, it exemplifies the visual focalization in video games using two case studies. In the following, it looks at possible problems and ...
Digital storytelling with DINAH: dynamic, interactive, narrative authoring heuristic
We present a novel method for defining and assembling interactive narrative while operating under an optimized constraint-based system. DINAH is a dynamic, interactive, narrative authoring heuristic that constructs textual stories from a database of short story components. Similar to video game techniques that blend short motion capture segments to construct interactive graphical characters, DI...
An Interactive Data Visualisation Approach for Next Generation Presentation Tools - Towards Rich Presentation-based Data Exploration and Storytelling
Existing research in the field of information visualisation has shown that interactive data exploration and storytelling can significantly improve the extraction and transfer of knowledge from raw data. Established visualisation techniques help viewers to strengthen their mental model and improve the understanding of the underlying data. However, these techniques are not yet manifested in slide...
Journal: The New Review of Hypermedia and Multimedia
Volume: 6
Publication year: 2000